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The anaerobic, thermophilic bacterium, Clostridium thermocellum, secretes multi-protein 
enzyme complexes, termed cellulosomes, which synergistically interact with the microbial 
cell surface and efficiently disassemble plant cell wall biomass. C. thermocellum has also 
been considered a potential consolidated bioprocessing (CBP) organism due to its ability 
to produce the biofuel products, hydrogen, and ethanol. We found that C. thermocellum 
fermentation of pretreated yellow poplar (PYP) produced 30 and 39% of ethanol and 
hydrogen product concentrations, respectively, compared to fermentation of cellobiose. 
RNA-seq was used to analyze the transcriptional profiles of these cells. The PYP-grown 
cells taken for analysis at the late stationary phase showed 1211 genes up-regulated 
and 314 down-regulated by more than two-fold compared to the cellobiose-grown cells. 
These affected genes cover a broad spectrum of specific functional categories. The 
transcriptional analysis was further validated by sub-proteomics data taken from the 
literature; as well as by quantitative reverse transcription-PCR (qRT-PCR) analyses of 
selected genes. Specifically, 47 cellulosomal protein-encoding genes, genes for 4 pairs 
of Sigl-Rsgl for polysaccharide sensing, 7 cellodextrin ABC transporter genes, and a set 
of NAD(P)H hydogenase and alcohol dehydrogenase genes were up-regulated for cells 
growing on PYP compared to cellobiose. These genes could be potential candidates for 
future studies aimed at gaining insight into the regulatory mechanism of this organism as 
well as for improvement of C. thermocellum in its role as a CBP organism. 
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INTRODUCTION 

Microbial conversion of biomass to biofuels is an attractive route 
for biofuel development, but an essential challenge is to increase 
the microbial capacity both for overcoming the biomass recal- 
citrance and for converting the biomass-derived sugars to bio- 
fuels (Himmel et al, 2007; Alper and Stephanopoulos, 2009). 
Clostridium thermocellum, a gram-positive, thermophilic, anaer- 
obic bacterium, is one of the model consolidated bioprocessing 
(CBP) systems used to study the enzymatic hydrolysis of cellu- 
losic biomass to produce fuels (Islam et al., 2006; Brown et al., 
2012; Yang et al., 2012). The following features of C. thermocellum 



Abbreviations: Abf, ot-N-arabinofuranosidase; ADH, alcohol dehydrogenase; 
CBO, cellobiose-grown cells only; Cbp, carbohydrate-binding protein; CBP, con- 
solidated bioprocessing; CDP, cellodextrin phosphorylase; CEP, cellobiose phos- 
phorylases; COG, clusters of orthologous groups; emPAl, exponentially modified 
protein abundance index; FC, fold changes; Fdred, reduced ferredoxin; GH, gly- 
coside hydrolases; NSAF, normalized spectral abundance factor; PF, passing filter; 
PTA, phosphotransacetylase; PYP, pretreated yellow poplar; PYPO, PYP-grown 
cells only; qRT-PCR, quantitative reverse transcription-PCR; RPKM, reads per kilo- 
base of exon model per million mapped reads; Rsgl, Rsgl-like anti-sigma factors; 
Sigl, sigma factor. 



contribute to its suitability as a model cellulolytic, biofuel- 
producing bacterium: (1) ft produces cellulosomes, a type of 
highly organized multi-protein enzyme complexes, which have 
been shown to be highly efficient enzyme systems in decon- 
structing plant cell wall, especially in degrading the recalcitrant 
substrate crystalline cellulose (Bayer et al, 1998, 2004). (2) It 
carries out mixed-product fermentation, producing ethanol, H2 
and numerous organic acids including acetate, formate, and lac- 
tate (Demain et al, 2005; Islam et al., 2006). (3) It is suitable for 
both submerged and solid-state fermentation (Bayer et al., 2007), 
the latter having similarity to compost in the set-up of feedstock 
(Wei et al, 2012). (4) Its genome sequence is available (http:// 
genome.jgi-psf.org/cloth/cloth.info.html) and many of the cel- 
lulolytic enzymes are identified and biochemically characterized. 
The knowledge gained from studies of this species will benefit 
work on other clostridial species of industrial interest, such as 
C. acetobutylicum, known to produce the potential fuels acetone, 
ethanol, and butanol (Cooksley et al., 2012). 

Fulfilling this potential will require a more in-depth under- 
standing of the metabolic and genetic mechanisms by which 
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C. thermocellum utilizes recalcitrant biomass substrate. So far, a 
number of studies have been carried out on C. thermocellum, 
such as transcriptomic analysis of stress responses to ethanol, fur- 
fural, and heat during growth on pure sugars (i.e., cellobiose) 
(Yang et al, 2012), time course studies of cell growth on pure 
crystalline cellulose (Avicel) (Raman et al., 2011), and compar- 
isons of growth on pure crystalline cellulose (Sigmacel 50) and 
cellobiose (Riederer et al., 2011). However, despite reports on 
the sub-proteomic analyses of cellulosomes from C. thermocellum 
grown on Avicel (Gold and Martin, 2007) and on combinations 
of Avicel with pectin and/or xylan, or on pretreated switchgrass 
(Raman et al., 2009), there were no genome-wide transcrip- 
tomic studies reported for growth on pretreated woody plant 
biomass until most recently a transcriptomic analysis comparing 
C. thermocellum cells grown on pretreated switchgrass and woody 
biomass, cottonwood (Populus trichocarpa x Populus deltoids; 
black cottonwood x eastern cottonwood in common names), was 
published on December 2, 2013 (Wilson et al., 2013a), after the 
submission of this manuscript. 

Woody biomass has been found to be more recalcitrant to 
enzyme digestion than is herbaceous biomass. For example, 
whereas the Recalcitrance Index for switchgrass is ~0.35, the 
index for hardwoods, such as yellow poplar, is 0.56, indicating 
that hardwoods are more difficult to be degraded (Wei et al., 
2009). Previous studies have shown that cellulolytic bacteria 
grown on different lignocellulosic substrates have different lev- 
els of glycoside hydrolase (GH) activities (Irwin et al., 2003). As 
such C. thermocellum grown on pretreated woody plant biomass 
is likely to have distinctive responsive genes, as well as different 
composition of cellulolytic enzymes different from those grown 
on herbaceous biomass. Recently, we found that species of the 
genus Clostridium (including C. thermocellum) were among the 
dominant species, comprising 6.3% of the total, in an anaero- 
bic community decomposing yellow poplar wood chips (van der 
Lelie et al, 2012). In parallel, we studied the yellow poplar com- 
post system (Wei et al., 2012), in which Clostridium was also 
found to be one of the dominant bacteria (data not shown). These 
results prompted us to explore C. thermocellum grown on dilute 
acid-pretreated yellow poplar (PYP) as a single-species model 
for the plant cell wall-degradation. PYP is a widely- utilized feed- 
stock in process-development for conversion of biomass to fuels 
and chemicals, and it is important to identify specific enzymes 
that this organism calls into action to attack this form of the 
feedstock/substrate. 

PYP has been previously demonstrated that 60% conversion 
to simple sugars can be achieved with a loading of ~8.4 mg per 
g biomass cellulose of a mixture of Trichoderma reesei CBHI and 
Acidothermus cellulolyticus El (95%: 5% on molar basis) (Vinzant 
et al., 1994; Baker et al, 1997). In this study, C. thermocellum 
was bench fermented using PYP and cellobiose as sole carbon 
sources, respectively, and transcriptional profiles were analyzed. 
RNA-seq is a recently developed technology for transcriptome 
profiling, which uses next-generation sequencing to reveal the 
presence and quantity of RNA in biological samples. The goals 
of this study are two-fold: first, we identified genes responsive 
for degradation of recalcitrant biomass substrates in PYP- vs. 
cellobiose-grown C. thermocellum cells. Secondly, we specifically 



focused on candidate genes related to cellulosome, cellodextrin 
transport, polysaccharide signal transduction and end-product 
synthesis related genes. These candidate genes are likely to be 
valuable for mechanism study; as well as for protein-engineering 
to further improve the abilities of this already potent organism to 
degrade the cell walls of recalcitrant biomass feedstocks. 

MATERIALS AND METHODS 

Figure 1A shows the overall experimental approach we designed 
to investigate the C. thermocellum utilization of pretreated 
biomass substrates as reflected by changes in the cell's transcrip- 
tome. Details of the experimental approach are described in the 
following sections. 

CARBON AND LIGNOCELLULOSIC SUBSTRATES 

Cellobiose and other chemical compounds were purchased from 
Sigma (St. Louis, MO). PYP was prepared as described previously 
(Tucker et al., 1998). Briefly, the milled yellow biomass (20% 
solids loading) was pretreated in 0.21% w/w H2SO4 at 200°C for 
4min. The resultant PYP contained ~65% cellulose, 4% xylan, 
and 31% lignin (dry weight basis). PYP was exhaustively washed 
with deionized water until the pH reached that of deionized water, 
prior to being used in medium preparation as described below. 

MICROORGANISMS AND CULTURE CONDITIONS 

C. thermocellum ATCC 27405 was routinely cultured at 60°C 
anaerobically in 30-mL serum bottles containing 10 mL of ATCC 
medium 1191 containing cellobiose, and was subcultured with 
2% inoculum taken at exponential growth phase. For this study, 
similarly to practices in the literature (Gold and Martin, 2007; 
Rydzak et al., 2009), 100-mL batch culture in 250-mL serum 
bottles was used for the growth of the strain anaerobically at 
58°C in ATCC medium 1191, containing 0.30% (w/v) cellobiose 
as the control carbohydrate substrate, or 0.44% PYP as biomass 
substrate (sugar-content equivalent to 0.30% cellobiose) with an 
agitation of 130 rpm. Cell growth was monitored based on either 
the measurement of optical density (OD) by spectrophotometry 
at 600 nm (OD600) for cellobiose-grown culture, or the increase in 
pellet protein amount for PYP-grown culture using the method 
described in literature (Raman et al, 2009). Three independent 
sets of PYP- vs. cellobiose-grown cell cultures, harvested at the 
late stationary phase (20 h for cellobiose-grown cells and 36 h 
for PYP-grown cells), were carried through the downstream RNA 
extraction preparations. Two sets of RNA preparations were used 
for cDNA library construction and RNA-Seq, whereas all three 
sets of RNA preparations were used for qRT-PCR analysis in 
verifying RNA-Seq data of selected genes. 

MEASUREMENT OF H 2 AND ETHANOL, AND ISOLATION OF BACTERIAL 
TOTAL RNA 

Cells were harvested for RNA isolation in late stationary phase 
after the hydrogen concentration at the headspace was analyzed 
using GC method, wherein an Agilent 7890A GC equipped with 
a Supelco 60/80 molsieve 5A column was used for the mea- 
surement of H2. The cell culture was centrifuged at high speed 
(8000 x g) for 5 min to pellet the bacterial cells. Cell-free cul- 
ture supernatants were used to measure ethanol concentration by 
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FIGURE 1 | Flowchart for experimental procedures, along with growth 
curves for cellobiose- and PYP-grown C. thermocellum cells. (A) 

Flowchart outlining the experimental procedures; RPKM, reads per 
kilobase of exon model per million mapped reads. (B) OD600 for 
cellobiose-grown cells was measured by spectrophotometry at 600 nm 



(OD600). (C) Cell pellet protein concentration for pretreated yellow poplar 
(PYP)-grown cell culture. The doubling time for cellobiose- and PYP-grown 
cells was 3.3 and 7.8 h, respectively. Data are presented as the mean 
(±SE) of 3 replicates. Arrows indicate the time points sampled for 
transcriptomic analyses. 



using HPLC (Agilent) with a refractive index detector (Shimadzu, 
Kyoto, Japan). All samples were filtered through a 0.45-u.m fil- 
ter before HPLC analysis. The organic acids were separated in an 
Aminex HPX-87H column (Bio-Rad) running at a flow rate of 
0.6 ml/min at 50°C, with 4 mM H 2 S0 4 as eluent. 

The cell pellets collected from 10 mL culture were used for 
RNA extraction, using a combined protocol (Yang et al., 2012), 
in which the TRIzol (Invitrogen, Carlsbad, CA) extraction aque- 
ous phase was mixed with an equal volume of 70% ethanol, and 
then applied to the column of Qiagen RNeasy Mini Kit (Qiagen, 
Valencia, CA) for purification according to the manufacturer's 
instructions to obtain total RNAs. 7.5 u,g total RNAs were mixed 
with 3 |xL DNase I (Invitrogen; 1 U/(xL) in 3 u,L lOx DNase I 
reaction buffer, with adding RNase-free water added up to a 
total volume of 30 |xL, and incubated at room temperature for 
15min. The DNase I was inactivated by adding 3 u,L of 25 mM 
EDTA and heating for 10 min at 65° C. The DNase I treated total 
RNA was precipitated by the ethanol-glycogen method and re- 
dissolved in 15 (i,L of 1 mM EDTA. Since the total RNAs contains 
75% rRNA (Chen and Duan, 2011), mRNA was enriched by 
using the MICROBExpress Bacterial mRNA Enrichment Kit (cat 
# AM1905, Ambion, Life Technologies, Austin, TX) to remove 16 
and 23 s rRNAs. The resultant RNAs were quantified and analyzed 
for integrity on the Agilent 2100 Bioanalyzer, and used for cDNA 
library construction as described below. 

cDNA LIBRARY CONSTRUCTION, SEQUENCING, AND READ MAPPING 

cDNA libraries were constructed with the RNA-Seq sample 
preparation kit (RS-100-0801, Illumina, San Diego, CA), using 
the procedure provided by the manufacturer. Briefly, the above 
enriched mRNA samples were fragmented, and then annealed 
to random hexamers and reverse transcribed. After second 
strand synthesis, end repair, and A-tailing, cDNA fragment 
ends were ligated to adapters that were complementary to 



sequencing primers. Resultant cDNA libraries were size sepa- 
rated on agarose gels with ~200 bp fragments being excised, and 
amplified by 15 cycles of PCR. The prepared cDNA libraries 
were sequenced on an Illumina Genome Analyzer II by the 
Michigan State University DNA facility, using the standard pro- 
tocols and running for 75 cycles of data acquisition. Solexa 
reads were aligned to the reference genome sequence of C. ther- 
mocellum strain (ATCC 27405) deposited at NCBI (Accession: 
NC_009012.1) using Bowtie (Langmead et al, 2009). The ref- 
erence genome is 3.8-Mb long with 3305 predicted genes, 
including 71 structural RNA genes, according to the NCBI 
website for the genome (http://www.ncbi.nlm.nih.gov/genome/? 
term=Clostridium-r- thermocellum) accessed on April 28, 2010. 
The best end-to-end alignment (ties broken by read quality) was 
used with no more than two mismatches. The reads that cannot 
be unambiguously mapped were not included for further analysis. 
The above alignment approach and criteria were similar to those 
used in literature (Jia et al, 2009; Lin et al, 2011). 

GENE EXPRESSION NORMALIZATION AND IDENTIFICATION OF 
DIFFERENTIALLY EXPRESSED GENES 

After read mapping, the midpoint of the read-reference align- 
ment was used to determine which gene that read belongs to (or 
was derived from), and the RNA-seq read counts for each gene 
can be then calculated. To facilitate comparison of gene expres- 
sion levels, both within and between two samples, we quantified 
and normalized gene expression levels by calculating the reads 
per kilobase of exon model per million mapped reads (RPKM), 
i.e., calculating the number of reads mapped per kilobase of 
exon model divided by the total number of mapped reads in 
the whole dataset (Mortazavi et al, 2008). As an indicator of 
gene expression level, the RPKM is widely recognized as a mea- 
sure of Solexa read density that reflects the molar concentration 
of a transcript in the starting RNA sample, thus making the 
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normalized gene expression levels comparable within and among 
samples. 

The fold-change for the expression of an individual gene was 
calculated by taking the ratio of RPKM in PYP-grown cells to that 
in cellobiose-grown cells (RPKMpYp/RPKM ce u 0 bi ose )- The cutoff 
value for defining a gene as "differentially expressed" is either 
a two-fold increase (with the value of RPKMpYp/RPKM ce u 0 bi 0 se 
larger than 2.0), or a two-fold decrease (with the value of 
RPKMp Y p/RPKM cellobiose less than 0.5). 

STATISTICAL ANALYSIS 

The log2-transformed raw RPKM dataset was imported into the 
statistical analysis software JMP Genomics 6.0 software (SAS 
Institute, NC), and the data of PYP- and cellobiose-grown 
cell samples were normalized together using the LOWESS nor- 
malization algorithm within JMP Genomics. To determine if 
the differential expression levels between PYP- and cellobiose- 
grown cell samples were statistically significant, the normalized 
Log2(RPKM) data were subjected to One -Way analysis of vari- 
ance (ANOVA) as described in literature (Yang et al, 2012, 2013; 
Wilson et al., 2013b). The False Discovery Rate (FDR) testing 
method was used with a significance threshold of p < 0.05 being 
considered statistically significant. 

BIOLOGICAL INTERPRETATION OF DIFFERENTIALLY EXPRESSED GENES 

The identified, differentially expressed genes were interpreted and 
discussed in the context of biological processes and functions, 
using clusters of orthologous groups (COG) and Carbohydrate- 
Active enZYmes (CAZy) analyses of proteins (www.cazy.org/). 

QUANTITATIVE REVERSE TRANSCRIPTION-PCR (qRT-PCR) 

Based on their potential functional importance, 12 genes were 
selected for validating the results of the RNA-Seq analysis. The 
primers for these genes were either based on literature or designed 
with Primer Express 3.0 software (Applied Biosystems), and 
are described along with the PCR results in the Results and 
Discussion section. Total RNA was extracted from three sets of 
independent cultures grown on PYP vs. cellobiose as described 
above, and then converted to cDNA by random priming, using 
the Superscript II kit (Invitrogen, San Diego, CA). PCR reactions 
were run in triplicate using procedure as previously described 
(Wei et al., 2012). The transcription level of genes was determined 
according to the 2~ AACT method, using RecA as a reference gene 
for the normalization of gene expression levels (Stevenson and 
Weimer, 2005). 

CORRELATION ANALYSIS FOR CELLUL0S0ME-RELATED GENE 
EXPRESSION AND PROTEIN ABUNDANCE 

Two sets of quantitative cellulosomal protein data of cellobiose- 
grown C. thermocellum cells at late stationery phase were retrieved 
from literature (Gold and Martin, 2007; Raman et al., 2009), and 
plotted against our sub-dataset of log2(RPKM) values of cellu- 
losomal genes in cells grown on the same substrate at the same 
culture phase in this study. Pearson correlation coefficient values 
were calculated using Microsoft Excel (Microsoft Corporation, 
Redmond, WA, USA), and used as an indicator for the degree of 
correlation for the compared pairs. 



COMPOSITIONAL ANALYSIS OF PRETREATED YELLOW POPLAR 
RESIDUES 

The PYP residues from PYP-grown C. thermocellum culture were 
collected by centrifugation at low speed (100 x g) for 2min to 
precipitate the insoluble substrate not consumed by the bacterial 
cells in the culture. Such centrifugation speed has been used in the 
literature to remove any insoluble substrate in C. thermocellum 
culture (Dror et al., 2005). Compositional analysis of the collected 
PYP residue was performed by the National Bioenergy Center, 
National Renewable Energy Laboratory, using method described 
in literature (Templeton et al, 2010). 

RESULTS AND DISCUSSION 

C. THERMOCELLUM GROWTH AND THE PRODUCTION OF HYDROGEN 
AND ETHAN0L 

The first step of this study, as illustrated in Figures 1A-C, is 
the growth of C. thermocellum ATCC 27405 with two types of 
carbohydrate substrates, PYP being compared with cellobiose. 
GC analysis of gas composition in the headspace of batch cul- 
tures at the late stationary phase revealed a production yield of 
1.22 vs. 0.92 mole H2/mole glucose consumed in cellobiose- and 
PYP-grown C. thermocellum, respectively (Table 1). Furthermore, 
HPLC analysis of metabolite production in supernatants of har- 
vested cell culture revealed a production yield of 0.51 vs. 0.30 
mole ethanol /mole glucose equivalent consumed in cellobiose- 
and PYP-grown C. thermocellum, respectively. Overall, the pro- 
duction of hydrogen and ethanol by PYP-grown C. thermocellum 
are 75 and 58% of those by cellobiose-grown C. thermocellum, 
respectively (Table 1). Wet chemistry analysis of spent PYP solids 
revealed that 52% of the glucan in the starting PYP substrate was 
consumed during the fermentations, which can partially explain 
the lower absolute H2 and ethanol productions compared to the 
culture on cellobiose substrate (in which nearly all cellobiose was 
depleted at late stationary phase). 

RNA-Seq RESULTS 

The quality of RNA-Seq cDNA libraries was assessed on a 
Bioanalyzer prior to GA II (Figure 2). The results showed the size 
distribution of cDNA library was between 200 and 300 bp, which 
met the requirement for Solexa sequencing. 

For the two PYP-grown cell cDNA library samples, Solexa 
sequencing from the first cDNA library generated 20.8 million 
total raw reads, with 14.5 million passing filter reads (70% PF 
reads); from the second cDNA library generated 19.6 million total 
raw reads, with 14.1 million passing filter reads (72% PF reads). 
Out of these PYP-grown cell derived reads, 2.3 and 2.1 millions 
reads were unambiguously mapped to C. thermocellum genome 
with the criteria set in the Materials and Methods section, respec- 
tively. For the two cellobiose-grown cell cDNA library samples, 
Solexa sequencing from their first cDNA library generated 21.1 
million total raw reads, with 15.7 million passing filter reads (74% 
PF reads); from the second cDNA library generated 19.7 mil- 
lion total raw reads, with 14.0 million passing filter reads (71% 
PF reads). Out of these cellobiose-grown cell-derived reads, 2.6 
and 2.2 millions reads were unambiguously mapped to C. ther- 
mocellum genome, respectively. The RPKM value for each gene 
in each condition was calculated as RPKM = Reads number 
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Table 1 | Hydrogen, ethanol and acetate production at the late stationary phase of C. thermocellum culture. 
Cell culture Yields 



Hydrogen 



Ethanol 



Acetate 



Acetate/Ethanol 



(Mole product/Mole glucose equivalent consumed) 



Cellobiose-grown cells 1.22 ±0.11 0.51 ± 0.05 

PYP-grown cells 0.92 ± 0.09 0.30 ± 0.04 

Yield in PYP-grown cells/yield in cellobiose-grown cells (%) 75 58 



0.55 ± 0.07 
0.27 ± 0.04 
49 



1.08 
0.92 



The time point for measurements was 20 and 36 h for cellobiose- and PYP-grown cells, respectively. Values for the yields of hydrogen, ethanol and acetate are 
means ± s.e.m. of three independent experiments. PYP acid-pretreated yellow poplar. 



bp 
(base 
pairs) 

1500 - 

1000 - 

700 - 

500 - 

400 - 

300 - 

200 - 

150 - 

100 - 

50 - 

15 - 



Q 07 



o 0 £ 



a 

a 



[FU]- 
40- 
20- 
0- 



[FU]. 

40- 
20 H 

0 



Cellobiose-grown cells 



iii i i i 1 — i — i — m 

15 100 200 300 500 1500 [bp] 



K 



PYP-grown cells 



Electrophoresis run 
(200-300 bp PCR product) 
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USA). The RNA-Seq cDNA libraries were prepared from RNA extracted from 
C. thermocellum cells grown on cellobiose or PYR (A) Electrophoresis run of 
the cDNA libraries by the Bioanalyzer; (B) bioanalyzer trace of cDNA libraries. 



mapped to gene/Length of the gene (kb)/Total reads number 
(million reads), as described in the Materials and Methods 
section. 

Overall, sequence analysis successfully aligned the Solexa reads 
to 3081 protein-coding genes and 61 structural RNA genes in 
cellobiose- and/or PYP-grown C. thermocellum mRNA samples, 
accounting for 95% [i.e., (3081 + 61)/3305] of all C. thermocel- 
lum genes in the genome, indicating that RNA-Seq analysis in 
this study achieved comprehensive coverage of the C. thermo- 
cellum transcriptome. In addition, the above result is consistent 
with the reports that in the genomes of other microorgan- 
isms such as Saccharomyces cerevisiae and S. pombe, more than 
90% of the genes are transcriptionally active and expressed 
(Nagalakshmi et al, 2008; Wilhelm et al, 2008; Wang et al., 
2009). The detailed lists of genes for the RNA-Seq sequences 
identified in the cellobiose- and PYP-grown cells are presented 
in Supplementary Data Sheet 1. Each of the RPKM values of 
cellobiose-grown cells and RPKM of PYP-grown cells was the 
average of two replicates. 



Our data from Solexa-read transcriptome measurement of 
PYP- vs. cellobiose-grown C. thermocellum illustrate some key 
characteristics of the results. First, the obtained RPKM val- 
ues for most of the transcripts of the active protein-coding 
genes in PYP- and cellobiose-grown C. thermocellum, are in 
the range of 1-50,000 (which can be revealed by re-sorting the 
Supplementary Data Sheet 1). Such range of RPKM values is 
comparable to the reported range of RPKM values in RNA-Seq 
whole-transcriptome analysis of other organism samples (Tang 
et al, 2009). 

The gene expression levels can be classified into four levels: 
low, moderate, high, and very high, as illustrated in the X axis 
in Figure 3. While the RNA-Seq data sets of PYP- vs. cellobiose- 
grown cells had comparable numbers of genes that fall in the 
moderate expression levels (i.e., both 40% of the active protein- 
encoding genes), the PYP-grown cells had more highly expressed 
genes (41 vs. 33% in that of cellobiose-grown culture; Figure 3). 
In contrast, cellobiose-grown cells had more lowly expressed 
genes (26 vs. 18% in that PYP-grown culture). 
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FIGURE 3 | RPKM frequency histogram of transcripts from RNA-Seq of 
pretreated yellow poplar (PYP)- vs. cellobiose-grown C. thermocellum 
cells. The diagram shows the distribution of the number of genes 
expressed at different RPKM levels. The total number of protein coding 
genes aligned with the RNA-Seq was 3081; the percentage value above 
each bar indicates the genes at specific expression level accounting for the 
proportion of total number of genes. The * mark indicates that significantly 
different frequencies (i.e., numbers of genes) were observed between the 
two RNA-Seq data sets from PYP- vs. cellobiose-grown C. thermocellum 
cells. 



CHANGES IN GLOBAL GENE EXPRESSION AND CLUSTERS OF 
0RTH0L0G0US GROUPS (COG) ANALYSIS 

To compare the differential expression of genes between PYP- 
and control cellobiose-grown cells, fold changes (FC) were com- 
puted as the ratio of the RPKM values obtained for individual 
genes in PYP- against cellobiose-grown cells. Analysis of changes 
in global gene expression had identified 1211 genes that show 
a two-fold or greater increase in expression (i.e., FC > 2.0) or 
detected in PYP-grown cells only (referred as PYPO genes), and 
314 genes with a two-fold or greater decrease (i.e., FC < 0.5) in 
transcript abundance for PYP- against cellobiose-grown cells or 
detected in cellobiose-grown cells only (referred as CBO genes), 
as listed in Supplementary Data Sheet 2 with related statistic 
analyses showing that they were statistically significant. 

The COG distribution for these up- and down-regulated genes 
in the transcriptome was determined and the result is shown 
in Figure 4. The top two categories for up- and down-regulated 
genes belong to the categories "general function prediction, 
[R]" (equivalent to unclassified) and "inorganic ion transport 
and metabolism, [P] ." A closer examination of the distribution 
chart reveals that for the up-regulated genes, the categories "sig- 
nal transduction mechanisms, [T]," "amino acid transport and 
metabolism, [E]," and "energy production and conversion, [C]" 
(which is important for the energy-consuming process for biofuel 
end-product production), were also the most well-represented 
categories in the transcriptome, with the number of induced 
genes above 200 for each category (Figure 4). 

An important category of COG for the degradation and uti- 
lization of lignocellulosic substrate is "carbohydrate transport and 



metabolism, [G]," which includes primarily cellulosome-related 
genes. In this category, the number of up- and down-regulated 
genes is 175 and 46, respectively (Figure 4). These differentially 
expressed genes are described in detail in later section. 

VALIDATION OF RNA-Seq RESULTS WITH QUANTITATIVE REVERSE 
TRANSCRIPTION-PCR 

Quantitative reverse transcription-PCR (qRT-PCR) is a well 
accepted method for verifying microarray (Ferreira et al., 2010; 
Yang et al., 2012) and RNA-Seq data (Cusick et al, 2012; Huang 
et al., 2012; Ji et al., 2013). We used this method to validate the 
expression patterns of 12 selected genes in independent biological 
replicates. The selected genes mainly represented different func- 
tional categories involving in cellulosome, hydrogen and ethanol 
production and had a range of FC values based on RNA-Seq. 
In addition to these 12 genes, the gene RecA was chosen as a 
reference gene to normalize the real time RT-PCR data because 
it has been commonly used as reference gene for C. thermocel- 
lum (Stevenson and Weimer, 2005). As a further confirmation 
for the use of this gene as reference gene, RecA did not differ 
in expression in our cell samples grown on two substrates, the 
values being RPKM 1166 for PYP-grown cells vs. RPKM 1237 
for cellobiose-grown cells. The primers for all 13 genes, along 
with one-by-one comparisons of the fold-changes in expression 
of each gene as measured by RNA-Seq and qRT-PCR, are listed 
in Supplementary Data Sheet 3, section 1. Most of the qRT-PCR 
data matched the RNA-Seq based FC values with a correlation 
coefficient of 0.95 for the set of 12 selected genes, which indi- 
cated that our RNA-Seq result is accurate and the conclusion from 
RNA-Seq should be reliable. 

TRANSCRIPTIONAL CHANGES OF CELLUL0S0MAL COMPONENTS 

Cellulosome is an extracellular supramolecular machine that can 
efficiently degrade crystalline cellulosic substrates and associated 
plant cell wall materials. We are especially interested in the genes 
encoding cellulosomal component proteins in response to cellu- 
losic substrates, which can lead to the identification of regulatory 
or rate-limiting components regarding cellulolysis. 

The number of reported cellulosomal genes in the genome of 
ATCC 27405 strain has been increasing from the initial numbers 
of 71 (Zverlov et al., 2005) and 72 (Zverlov and Schwarz, 2006), to 
more recently, 81 genes (Raman et al., 2011). Note that the latter 
number is consistent with the updated genome annotation. Of 
these 81 genes, all were detected transcriptionally in this study. 
The list of 81 cellulosomal genes with their FC values is described 
in Table 2. 

Analysis of this sub-transcriptome showed the following fea- 
tures: first, the overall cellulosome-associated genes were up- 
regulated significantly in PYP- vs. cellobiose-grown cells, reflected 
by the facts that out of the 81 cellulosomal genes, 47 (i.e., 
58%) were up-regulated with FC > 2.0. In contrast, only 4 out 
of 81 (i.e., about 5%) cellulosomal genes, including a GH5 
gene (Cthe_2193), CelU (Cthe_2360), XghA (Cthe_1398), and 
Cthe_0438, were down-regulated by two-fold or detected in 
celllobiose-grown cells only. The overall average FC value for all 
these 81 cellulosomal genes is 3.5 (see the bottom row in Table 2), 
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B: Chromatin structure and dynamics 
C: Energy production and conversion 

D: Cell cycle control, cell division and chromosome partitioning 

E: Amino acid transport and metabolism 

F: Nucleotide transport and metabolism 

G: Carbohydrate transport and metabolism 

H: Coenzyme metabolism 

I: Lipid metabolism 

J: Translation, ribosomal structure and biogenesis 
K: Transcription 

L: DNA replication, recombination, and repair 



M: Cell wall/membrane/envelope biogenesis 
N: Cell motility and secretion 

0: Posttranslational modification, protein turnover, chaperones 
P: Inorganic ion transport and metabolism 

Q: Secondary metabolites biosynthesis, transport, and catabolism 

R: General function prediction 

S: Function unknown 

T: Signal transduction mechanisms 

U: intracellular trafficking and secretion 

V: Defense mechanisms 

Z: Cytoskeleton 



FIGURE 4 | COG functional categories of the 1211 up- and 314 down-regulated genes in PYP-grown cells compared with cellobiose-grown cells. Note 
that since COG annotation classes overlap, the sum of COG annotated genes is larger than the number of total up- and down-regulated genes analyzed. 



suggesting the whole cellulosome machinery was "geared up" at 
the late stationary phase on PYP substrate. 

Secondly, the primary scaffoldin, CipA, and the main sec- 
ondary scaffoldins, OlpB, Orf2P, and SdbA, have shown signifi- 
cant up-regulation and were found to have the highest abundance 
on the transcriptional level. This implies that the celluloso- 
mal system is crucial for the efficient degradation of pretreated 
PYP. Our data further verifies the notion that the C. thermo- 
cellum cellulosome is the main contributor to the extremely 
high activity observed in cellulose degradation. Additionally, one 
putative scaffoldin gene, Cthe_0736 (cellulosome anchoring pro- 
tein) has been up-regulated as much as 2.8 times, i.e., FC 2.8 
(Table 2, row 31). OlpC, which has been recently identified as 
an important outer layer protein - cellulosome anchoring pro- 
tein cohesin subunit (Pinheiro et al., 2009), was also found 
to be up-regulated in this study (Cthe_0452, FC 4.1; Table 2, 
row 19). 

Thirdly, the major cellulosomal cellulases, such as CelS 
(Cthe_2089, exo-, FC 4.5; RPKM 3848— the most abundant cellu- 
losomal transcripts in PYP-grown cells), CelA (Cthe_0269, endo-, 
FC 1.7, RPKM 1885 — the third most abundant transcript in 
PYP-grown cells), CbhA (Cthe_0413, exo/endo, FC 7.5, RPKM 
367), and Cell24A (Cthe_0435, endo-; FC 2.0; RPKM 401) were 
remarkably up-regulated, as shown in Table 2. Exo- and endo- 
refer to the substrate site upon which the GHs act. While exo- 
cellulases remove one or more sugar units from the ends of 



polysaccharide chain, endoglucanases randomly hydrolyze the 
internal glycosidic bonds of polysaccharides. Among the above 
listed enzymes, CelS was reported to display high synergy with 
the endo-Cell24A (Bras et al., 2011). Our observation that the 
CelS had the highest RPKM value in PYP-grown cells is consis- 
tent with a report that a knockout mutant of this gene showed 
a ~60% reduction in cell cellulolytic performance (Olson et al., 
2010), and is also consistent with a most recent finding that CelS 
gene was found to be highly expressed in C. thermocellum cells 
grown on both pretreated switchgrass and cottonwood substrates 
(Wilson et al., 2013a). This study furthered such transcriptional 
analysis by showing that the transcript of CelS is not only highly 
abundant, but also significantly responsive to biomass substrate, 
as its RPKM level in cellobiose-grown cell was 4.5 times lower 
(Table 2). 

In addition, there was one cellulosome dockerin type I gene 
(Cthe_0438) for which the transcript was detected only in 
cellobiose-grown cells, with a low RPKM value of 4 (Table 2, 
row 81). This observation is consistent with a previous report 
regarding its absence in the sub-proteome of cellulosomes from 
cells grown on pretreated switchgrass (Raman et al., 2009). The 
domain architecture of Cthe_0438 is "DUF843-type I dockerin" 
(DUF indicates domain of unknown function). It is difficult to 
classify the association of this gene with any of the specific cat- 
alytic enzyme types, cellulases or hemicellulases, and thus this 
issue may warrant further studies. 
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Table 2 | List of 81 cellulosomal genes with their RPKM and fold change (FC) values determined in PYP- vs. cellobiose-grown C. thermocellum 
cells. 

Row no. Gene ID Protein GH, CBM family RPKM of PYP cells RPKM of cellobiose cells FC 
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The table is sorted in the order of FC values. The glycoside hydrolase (GH) and carbohydrate-binding module (CBM) families were determined according to the 
annotations in GenBank and the CAZY database (www.CAZY.org). Other abbreviations: Cbh, cellobiohydrolase; CBO, transcript that detected only in cellobiose- 
grown cells; CE, carbohydrate esterases; Cel, cellulase; Chi, chitinase; CipA, cellulose-integrating protein A; Lie, lichinase; Man, mannanase; n/a, not applicable; OlpB, 
outer layer protein; ORF2p, open reading frame 2p; PL, polysaccharide lyase; Prot, protein; SdbA, scaffoldin-dockerin binding component A; XghA, xyloglucanase 
Xgh74A; Xyn, xylanase. 



EXPLORING THE CORRELATION BETWEEN RNA-Seq AND PUBLISHED 
PR0TE0MIC DATA FOR CELLUL0S0MAL GENES 

Literature reports have described the quantitative sub-proteomic 
data of cellulosomes extracted from cell-free culture filtrates of 
C. thermocellum grown to late stationary phase on cellobiose, 
Avicel, and other cellulosic substrates (Gold and Martin, 2007; 
Raman et al, 2009). In one study (Gold and Martin, 2007), an 
"emPAI," defined as the exponentially modified protein abun- 
dance index, showed a linear relationship with protein concen- 
tration and was normalized to the value obtained for CipA. 
Similarly, in another study (Raman et al., 2009), the normal- 
ized spectral abundance factor (NSAF) represented the number 



of spectral counts divided by the number of amino acid residues 
in the protein and was also normalized to the value obtained 
for CipA. To determine whether or not a correlative relation- 
ship exists between these published cellulosomal sub-proteomic 
data and the RNA-Seq RPKM data in the present study, we first 
retrieved the emPAI/CipA and NSAF/CipA data from the litera- 
ture (Supplementary Data Sheet 3, section 2), and then plotted 
it against the Log2(RPKM) data for the genes encoding the same 
proteins in this study. The results showed that despite the fact 
that the plotted data were obtained from three different research 
groups, there remain strong correlations between the RNA-Seq 
RPKM and the protein abundance indicators of emPAI/CipA and 
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FIGURE 5 | Correlation between the observed gene abundances in this 
study and the published protein abundance data of cellulosomal 
components. (A) Correlation between Log2(RPKM) and emPAI/CipA. (B) 
Correlation between Log2(RPKM) and NSAF/CipA. The literature sources 
for data of emPAI/CipA and NSAF/CipA are described in the Results and 
Discussion section. 



NSAF/CipA, with Pearson correlation coefficient values being 
0.68 and 0.76, respectively (Figures 5A,B). 

NON-CELLULOSOMAL GLYCOSIDE HYDROLASE PROTEINS AND "FREE" 
CELLULASES 

In addition to the GHs existing in cellulosomes, the C. ther- 
mocellum genome includes non-cellulosomal genes coding for 
GHs, among which 12 were up-regulated by more than two- 
fold (Table 3). Notably, both CelC (Cthe_2807) and Licl6A 
(Cthe_2809), the two members in the putative CelC- GlyR3- 
Licl6A operon (Newcomb et al., 2011), were up-regulated and 
predicted to be extracellular and/or bacterial cell wall associated 
(Table 3). This was consistent with a recent report that genes in 
this operon were mainly expressed in the stationary phase (very 
little during exponential phase) of pure cellulose (Avicel) fermen- 
tation (Raman et al, 2011). In addition, Licl6A contains CBM54 
and a tandem of four CBM4s, in which both types of CBMs are 
able to bind xylan; as well as cellulose (Dvortsov et al, 2010). 
Remarkably, the grouping of four CBM4s in this protein has 
an ~ 100-fold higher binding constant for xylan and cellulose than 
that of a protein with a single CBM4 module (Dvortsov et al, 
2012). 

It is surprising that Cel9I (Cthe_0040), an important non- 
cellulosomal processive endoglucanase that could digest crys- 
talline cellulose with high efficiency, showed low abundance 



and FC value (RPKM 62 in PYP-grown cells, FC 1.3; see 
Supplementary Data Sheet 1, row no. 1875). Similarly, the only 
non-cellulosomal exo-cellulase CelY (Cthe_0071, exo-) also 
showed both low abundance and FC value (RPKM 61, FC 1.4, see 
Supplementary Data Sheet 1, row no. 1741), which is consistent 
with a recent report that knocking out the CelY gene had no sig- 
nificant impact on the cellulolytic capacity of the strain (Olson 
et al., 2010). Based on the evidence above, it would seem that 
the free-enzyme system plays a less important role than does the 
cellulosome system in the degradation of PYP by C. thermocellum. 

RESPONSIVE SIGMA FACTORS COUPLED WITH 
MEMBRANE-ASSOCIATED ANTI-SIGMA FACTORS AS A MECHANISM 
FOR BI0SENSING BI0MASS SUBSTRATES 

Because COG analysis had revealed that the category of "sig- 
nal transduction mechanisms, [T]" was among the most rep- 
resented categories in the transcriptome of PYP-grown cells, 
we checked the genes related to the signal transmission 
attributed to sensing cellulose and other polysaccharide sub- 
strates. Recently, it was proposed the possible role of seven 
membrane-associated Rsgl-like anti-sigma factors (referred to as 
Rsgl) and their sigma factor partners (referred as Sigl) in extra- 
cellular carbohydrate-sensing and glycosidase gene regulation in 
C. thermocellum (Kahel-Raifer et al, 2010; Nataf et al., 2010). 
In their proposed model, Rsgl senses the presence of cellulose 
and other biomass components in the extracellular medium via 
its CBM domains, whereas Sigl mediates the intracellular acti- 
vation of different glycosidase genes (Kahel-Raifer et al., 2010; 
Nataf et al., 2010; Bahari et al, 2011). These predicted gene 
pairs include: Sigll- Rsgll (Cthe_0058- Cthe_0059), crI2-RsgI2 
(Cthe_0268- Cthe_0267), crI3-RsgI3 (Cthe_0315- Cthe_0316), 
aI4-RsgI4 (Cthe_0403- Cthe_0404), oT5-RsgI5 (Cthe_1272- 
Cthe_1273), aI6-RsgI6 (Cthe_2120- Cthe_2119), and a24C- 
Rsi24C (Cthe_1470- Cthe_1471). Among them, three Sigl genes 
(Cthe_0058, Cthe_0268, Cthe_0403) were up-regulated during 
later stages of pure cellulose (Avicel) fermentation (Raman et al, 
2011). 

In this study, while the transcripts of all seven pairs of genes 
had been detected (see Supplementary Data Sheet 1), three pairs 
of genes and an extra Sigl-Rsgl pair (which was not included in 
the initial literature prediction shown above) were up-regulated 
with FC values above 3.0 in the PYP- against cellobiose-grown 
cells at the late stationary phase, as described below: 

(1) oT3-RsgI3 (Cthe_0315- Cthe_0316), with FC 4.2, and 5.9, 
respectively. According to literature, RsgI3 contains a PA14 
dyad domain that target pectin (Nataf et al., 2010). The dif- 
ferential expression of this pair was not previously reported 
in cellulose-grown C. thermocellum. 

(2) aI4-RsgI4 (Cthe_0403- Cthe_0404), with FC 3.1, and 3.9, 
respectively. RsgI4 contains CBM3 that binds cellulose. This 
observation supports previous observation of this pairs being 
involved in the time course of pure cellulose fermentation by 
C. thermocellum (Raman et al., 2011). 

(3) a 24C -Rsi24C (Cthe_1470- Cthe_1470), with FC 4.0 and 5.9, 
respectively. Note that Rsi24C contains a module that resem- 
bles GH5 that target cellulose based on literature (Nataf et al, 
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Table 3 | Twelve non-cellulosomal GH genes that were up-regulated by two or more folds in pretreated yellow poplar (PYP)- compared with 
cellobiose-grown C. thermocellum cells. 



Gene ID 


Protein and GH family 3 


FC 


Cellular location prediction 13 


Cthe_1256 


GH3; bgIB; p-glucosidase B 


28.6 


n/a 


Cthe_1428 


GH1 ; bgl 


9.1 


Cytoplasmic (Psortb) 


Cthe_2744 


GH23; lytic transglycosylase 


7.5 


Membrane (UniProt) 


Cthe_2809 


GH16, CBM4, CBM54; f3-1,3-1,4-glucanase, Lic16A 


6.0 


Cell wall (Psortb) Extracellular (Psortb) 


Cthe_1471 


GH5; Rsi24C 


5.9 


Membrane (Bahari et al., 2011) 


Cthe_2807 


GH5; CeIC, endoglucanase C 


5.0 


Extracellular (Psortb) 


Cthe_0043 


GH9, CBM3; CeIN, Cel9N 


4.4 


Membrane (Psortb) 


Cthe_0322 


GH3 


4.3 


Extracellular (Uniprot) 


Cthe_2191 


a-glucan branching enzyme; GH13 


3.3 


Membrane (Burton, 2010) 


Cthe_2548 


GH51, a-N-arabinofuranosidase 


3.3 


Cytoplasmic (UniProt) 


Cthe_1221 


GH94, GT84; glycosyltransferase 


3.0 


Membrane (Psortb) 


Cthe_0795 


GH13, CBM34; a-amylase, catalytic region 


2.2 


Cytoplasmic (Psortb) 



a GH family characterization was based on CAZy database (http://www.cazy.org/). 

b The cellular location was based on three sources: literature, UniProt database (http://www.uniprot.org/), or prediction by Psortb program 
(http://www.psort.org/psortb/index.html). All listed genes were statistically significantly up-regulated (p < 0.05). n/a, not available. 



2010). Similar to aI3-RsgI3, the differential expression of 
this pair was also not previously reported in cellulose-grown 
C. thermocellum. 
(4) Equally interesting, we found that another Sigl-Rsgl 
pair (Cthe_2521-Cthe_2522) was detected on tran- 
scriptome, with FC values of 3.9 and 4.8, respectively 
(Supplementary Data Sheet 1). The protein of sigma factor 
Cthe_2521 had been detected in the proteome of C. ther- 
mocellum cultured on cellobiose (Rydzak et al., 2012), but 
was not in the initial 7 pairs of Sigl-Rsgl proposed for 
polysaccharide signal transmission in the literature (Nataf 
et al, 2010). The possible role of this pair warrants further 
investigation. 

GENES RELATED TO CELL0DEXTRIN TRANSPORT AND 
PHOSPHORYLATION 

C. thermocellum has been reported to use ABC-type trans- 
porters for uptake of oligosaccharides derived from cellulose 
hydrolysis (Strobel et al., 1995), which is an important energy- 
conserving mechanism by which importing long cellodextrins 
can reduce the cost of transport as one-ATP molecule is con- 
sumed per transport event. So far, four cellodextrin ABC trans- 
porters (carbohydrate-binding protein CbpA, B, C, and D) have 
been characterized for their substrate binding features (Nataf 
et al, 2009). Whereas CbpA (Cthe_0393) binds only to cel- 
lotriose (G3), CbpB (Cthe_1020) binds to G2-G5 cellodextrins, 
and CbpC (Cthe_2128) and D (Cthe_2446) bind to G3-G5 
cellodextrins (Nataf et al, 2009; Rydzak et al., 2012). Several 
transcripts of these annotated genes have been detected in the 
transcriptome of C. thermocellum (Raman et al., 2011; Riederer 
et al., 2011). Previously, six cellodextrin ABC transporter genes 
(Cthe_1862, Cthe_0391-0393, and 1019-1020, including CbpA 
and CbpB) were found to be expressed at high levels throughout 
the course of Avicel alone fermentation (Raman et al., 20 1 1 ) . Most 
recently, Cthe_039 1-0393 were found to be highly expressed on 
both pretreated switchgrass and cottonwood substrates (Wilson 



et al., 2013a). This study furthered such transcriptional anal- 
ysis by comparing the gene expression between PYP- and 
cellobiose-grown cells at late stationary phase. A total of 12 
transcripts of cellodextrin ABC transporters have been detected 
(Supplementary Data Sheet 3, section 3; also Figure 6), among 
which, seven were up-regulated in PYP- against cellobiose-grown 
cells, with FC values in the range of 3.1-10.3, suggesting that the 
PYP-grown cells had a mechanism for enhancing the uptake and 
utilization of polysaccharides derived from biomass substrates. 

For the subsequent phosphorolytic cleavage of the imported 
oligosaccharides and cellobiose, this study identified the tran- 
scripts of one cellodextrin phosphorylase (CDP, Cthe_2989) and 
two cellobiose phosphorylases (CEP, Cthe_0275 and Cthe_1221). 
Among these, the CEP Cthe_1221 was up-regulated with a 
FC value of 3.0 (Supplementary Data Sheet 3, section 3; also 
Figure 6), suggesting this gene may warrant further studies. 

GENES RELATED TO GLYCOLYSIS, PYRUVATE CATAB0LISM, AND 
END-PRODUCT SYNTHESIS 

The deduced pathway for cellulolysis, glycolysis, ethanol, and 
H2 production in C. thermocellum ATCC 27405 is illus- 
trated in Figure 6, which is in accordance to accumu- 
lated literature findings (Demain et al., 2005; Rydzak et al., 
2009, 2011; Riederer et al, 2011; Carere, 2013). The set of 
gene IDs for the enzymes involved in above pathway were 
retrieved from the KEGG PATHWAY database (http://www. 
genome.jp/kegg) (Kanehisa et al., 2008), and were shown in 
Supplementary Data Sheet 3, section 3 with their FC values in 
PYP- vs. cellobiose-grown cells. Out of the listed genes, 18 
and 12 were significantly up- and down-regulated, as shown 
in red and green text in Supplementary Data Sheet 3, section 3, 
respectively. 

For pyruvate catabolism and end-product synthesis, C. ther- 
mocellum may convert pyruvate into (1) formate via pyruvate 
formate lyase (PFL) and pyruvate formate lyase activating enzyme 
(PFL-AE), (2) lactate via lactate dehydrogenase (LDH), (3) C0 2 , 
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FIGURE 6 | Diagram for the primary steps in the conversion of cellulose 
to fermentation products using C. thermocellum ATCC 27405. Sigl-Rsgl 
(sigma factors coupled with membrane-associated Rsgl-like anti-sigma 
factors) are also illustrated in this diagram as they have been proposed in 
literature for polysaccharide triggered signal transmission in regulation of GH 
family genes. Detailed nomenclature of enzymes and their associated gene 
IDs can be found in Supplementary Data Sheet 3, section 3 and Table 4. 
For enzymes associated with a single gene ID, the gene ID is not included in 
the figure; instead, its fold change (FC) value is indicated directly beside the 
enzyme. For enzymes associated with multiple gene IDs, the gene IDs and 



their FC values are illustrated in sorted lists in the diagram, in order of either 
their FC values or locus IDs. Text in red and green represents the 
up-regulated and down-regulated genes, respectively, in PYP- against 
cellobiose-grown cells; in contrast, text in black represents the genes with no 
significant transcriptional changes between the two types of cells, i.e., 2.0 > 
FC value > 0.5. Nomenclature of metabolites: CoA, coenzyme A; ECH, 
energy conserving hydrogenase; Fd, ferredoxin; Fru, fructose; Glu, glucose; 
ox, oxidized; P, phosphate; PEP phosphoenolpyruvate; Pi, inorganic 
phosphate; PYPO, transcript that was detected in PYP-grown cells only; Pyr, 
pyruvate; red, reduced. 
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Table 4 | C. thermocellum genes encoding putative hydrogenases, 
sensory hydrogenases, and NADH:Fd oxidoreductases using 
ferredoxin and NAD(P)H as electron carriers. 

Locus ID Gene name FC 



TYPE 1. [NiFe] H 2 ase 

Fd-dependent ECH: Cthe_3013-Cthe_3024 



Cthe_3013 


Hydrogenase expression/formation protein HypE 


1.6 


Cthe_3014 


Hydrogenase formation HypD protein 


1.4 


Cthe_3015 


Hydrogenase assembly chaperone hypC/HupF 


0.8 


Cthe_3016 


[NiFe] hydrogenase maturation protein HypF 


0.8 


Cthe_3017 


Hydrogenase accessory protein HypB 


0.9 


Cthe_3018 


Hydrogenase expression/synthesis HypA 


0.6 


Cthe_3019 


4Fe-4S ferredoxin, iron-sulfur binding 


0.5 


Cthe_3020 


Ech hydrogenase subunit E (Fd H2ase) 


0.3 


Cthe_3021 


Ech hydrogenase subunit D 


0.4 


Cthe_3022 


Ech hydrogenase subunit C 


1.5 


Cthe_3023 


Ech hydrogenase subunit B 


0.8 


Cthe_3024 


Ech hydrogenase subunit A 


0.5 




Type 2.1. Bifurcating: Cthe_0338-Cthe_0342; Cthe_0428-Cthe_0430 


Cthe_0338 


NADH-dependent hydrogenases 


0.9 


Cthe_0340 


Ferredoxin 


0.9 


Cthe_0341 


NADH dehydrogenase (ubiquinone) 


1.2 


Cthe_0342 


[FeFe] H 2 ase (NADH H 2 ase) 


1.3 


Cthe_0428 


NADH dehydrogenase (ubiquinone) 


6.4 


Cthe_0429 


NADH dehydrogenase (ubiquinone) 


4.4 


Cthe_0430 


[FeFe] H2ase 


7.9 


Type 2.2. Sensory: Cthe_0425-Cthe_0426 


Cthe_0425 


Putative hydrogenase 


9.9 


Cthe_0426 


[FeFe] hydrogenase 


11.0 


Type 2.3. I\IAD(P)H dependent: Cthe 3003-3004 


Cthe_3003 


[FeFe] H2ase, putative 3 


1.5 


Cthe_3004 


Ferredoxin 


0.6 


OTHER RELATED GENES 


Cthe_0335 


Hydrogenase large subunit- like protein 


3.2 



In addition, hydrogenase maturation proteins are also listed. The table is 
arranged according to the hydrogenase classification described by two literature 
sources, as cited in the related Results and Discussion section; within each 
sub-category, it was arranged in order of locus IDs. For each gene, fold change 
(FC) value was calculated by dividing the RPKM of PYP cells by RPKM of 
cellobiose cells, as listed in Supplementary Data Sheet 1. RPKM, reads per 
kilobase of exon model per million mapped reads. Text in black represents 
the genes with no significant transcriptional changes between the two 
types of cells, i.e., 2.0 > FC value > 0.5. Text in red and green represents 
the genes that found to be significantly up-regulated and down-regulated, 
respectively, in PYP- against cellobiose-grown cells, based on statistic analysis 
in Supplementary Data Sheet 2. ECH, energy conserving hydrogenase. PYPO, 
transcript that detected in PYP-grown cells only. 

3 Cthe_3003 (IFeFel fyase, putative) was characterized as Fd-linked 
by Schut and Adams (2009), but as NAD(P)H dependent recently by 

Carere et al. (2012). 



Fd re a, and acetyl-CoA, in which acetyl-CoA eventually leads 
to the production of acetate and ethanol, the Fd rec j leads to 
the formation of hydrogen (Figure 6). The up-regulation of 
two out of four alcohol dehydrogenase (ADH) genes, namely 
Cthe_0423 (FC value 7.0) and Cthe_0394 (FC value 5.3), raise 
the average FC value for ADHs to 3.6. In contrast, the phos- 
photransacetylase (PTA; Cthe_1029), whose activity had been 
verified in C. thermocellum (Lamed and Zeikus, 1980), was 
significantly down-regulated in PYP- against cellobiose-grown 
cells, with an FC value of 0.2. These data suggest that based 
on mRNA profiling, carbon flux of acetyl-CoA may preferen- 
tially be channeled to ethanol production in PYP-grown cells. 
The observed decrease of acetate/ethanol ratio, from 1.08 in 
cellobiose-grown cells to 0.92 in PYP-grown cells (Table 1), sup- 
ports this carbon shift at the late stationary phase of PYP-grown 
cells. 

Lactate and formate were not monitored in this study as 
previous study has shown that in the late growth phase of C. ther- 
mocellum cultures grown on cellobiose substrate, lactate, and 
formate together represent only a small fraction of the total end 
products produced (Islam et al, 2006). Future studies should also 
monitor the production of lactate, formate and CO2, which could 
provide a whole picture for carbon balance for the cells utilizing 
biomass substrates. 

DYNAMICS FOR H 2 PRODUCTION WITH PYP AS SUBSTRATE 

C. thermocellum genes encoding putative hydrogenases and sen- 
sory hydrogenases using ferredoxin and NAD(P)H as electron 
carriers are listed Table 4. In addition, hydrogenase maturation 
proteins are also listed. The classification of the above hydroge- 
nases are mainly based on literatures that systematically charac- 
terized the putative hydrogenases in C. thermocellum and other 
species (Schut and Adams, 2009; Carere et al, 2012). Briefly, there 
are two types of hydrogenases (Leases) according to the metal 
content in the respective active sites: 

The first type is [NiFe] F^ase, a putative Fd-dependent 
energy converting hydrogenase (ECH). Its hexameric structural 
subunits are encoded by Cthe_3019-Cthe_3024, with assem- 
bly of its active site assisted by a suite of maturation pro- 
teins encoded by Cthe_3013-Cthe_3018 (Table 4, Figure 6). This 
study showed that the FC values for 7 of the 12 detected 
Fd-H 2 ase and related genes (Cthe_3013-Cthe_3015, Cthe_3017- 
Cthe_3018, and Cthe_3022-Cthe_3023) were in the range of 
0.6-1.6, suggesting an unchanged status at the transcrip- 
tional level for these genes. However, the remaining five genes 
(Cthe_3019-Cthe_3021, Cthe_3023-Cthe_3024) encoding the 
Fd-H2ase structural subunits were significantly down-regulated 
with FC values between 0.3 and 0.5, which might account for the 
observed lower H2 yield in PYP-grown cells, compared to the con- 
trol cellobiose-grown cells. This finding implies that the [NiFe] 
ECH H2ase likely functions in the H 2 production direction since 
its down-regulation led to less H2 accumulation in the culture 
headspace. 

The second type is [FeFe] H2ases that include two bifurcat- 
ing H2ases: both being trimeric, and encoded by Cthe_0338- 
Cthe_0342 and Cthe_0428-Cthe_0430, respectively. Only seven 
genes are listed in Table 4 because Cthe_0339 is annotated as 
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histidine kinase in GenBank protein database. The second type 
also includes a sensory H2ase (Cthe_0425-Cthe_0426) and a 
NAD(P)H dependent H 2 ase (Cthe_3003-Cthe_3004), as listed in 
Table 4. The data showed that the FC values for the trimeric 
bifurcating H 2 ase genes (Cthe_0338, Cthe_0340-Cthe_0342) and 
the NAD(P)H dependent H 2 ase genes (Cthe_3003-3004) were in 
the range of 0.6-1.5, indicating an unchanged status for their 
transcription (Figure 6). A homolog of this trimeric H 2 ase has 
been uncovered in Acetobacterium woodii which functions in H 2 
consumption yielding reduced Fd and NAD(P)H (Schut and 
Adams, 2009; Hess et al., 2013a,b). In contrast, genes encoding 
the trimeric bifurcating H 2 ase (Cthe_0428-Cthe_0430) were sig- 
nificantly up-regulated in PYP-grown cells, with FC values of 
6.4-7.9 (Figure 6). Based on literature (Schut and Adams, 2009; 
Calusinska et al., 2010), the trimeric [FeFe] H 2 ase identified 
in C. thermocellum is a putative bifurcating hydrogenase. This 
cytoplasmic enzyme was initially characterized in Thermotoga 
maritima and uses both reduced ferredoxin and NAD(P)H as 
substrates, and functions in H 2 production (Schut and Adams, 
2009; Hess et al, 2013a). Yet the up-regulation of Cthe_0428- 
Cthe_0430 (this work) led to decreased H 2 production when 
cultured in PYP- vs. cellobiose-grown cells. As such, the exact role 
of the C. thermocellum trimeric H 2 ase is unknown, which makes 
it hard to link the above transcriptional data to the observed 
lower H 2 yield in PYP-grown cells. Further studies are warranted 
to investigate the exact direction (production or consumption 
of H 2 ) for each of the above [FeFe] H 2 ases. 

In addition, the sensory H 2 ase encoded by Cthe_0425- 
Cthe_0426 had a FC value of 9.9 and 1 1.0, respectively (Table 4), 
and future studies are also needed to explore the implication of 
the significant up-regulation of these two genes in PYP-grown 
cells. 

GENES FOR DEGRADING HEMICELLULOSE AND LIGNIN IN FEEDSTOCK 
BI0MASS 

We noticed that some important hemicellulase genes, such as 
XynY (Cthe_0912, FC 13.8), XynD (Cthe_2590, FC 2.2), XynZ 
(Cthe_1963, FC 1.4), ManA (Cthe_2811, FC 3.4) were up- 
regulated in PYP- against cellobiose-grown cells (Table 2; cel- 
lulosomal proteins). In addition, some auxiliary enzymes such 
as a-N-arabinofuranosidase (Cthe_2548, FC 3.3; Table 3, non- 
cellulosomal protein) were also up-regulated. This is probably 
caused by the adaptation of the strain to digest the PYP, of which 
the chemical composition and structure are much different from 
those of cellobiose. In contrast, we did not identify the primary 
genes involved in lignin modification and degradation, such as 
laccases, lignin peroxidases, or manganese peroxidases. 

GROWTH PHASE AND THE TRANSCRIPTOME OF CELLS 

The main focus of this study was to investigate the effects of 
different carbon sources (PYP vs. cellobiose) on the transcrip- 
tome of the cells. To attain this goal, the selection of sampling 
point was crucial. The experimental design of this study in 
choosing the late stationary phase is in accordance with litera- 
ture (Gold and Martin, 2007; Raman et al, 2009). One merit 
for choosing this sampling point is that our transcriptomic 
data can be cross-analyzed with the literature's protein data of 



cellulosomes extracted from cellobiose-grown cells, which are 
used as one means of validating the transcriptomic data in this 
study. 

Another merit in choosing this sampling point is that com- 
pared with other early growth stages (exponential or early sta- 
tionary phases), the levels of simple, soluble sugars in PYP- and 
cellobiose-grown culture are both very low: due to either the limit 
of bacterial enzymes in releasing sugars from PYP or the deple- 
tion of cellobiose, respectively. In this sense, late stationary phase 
is relatively more suitable phase than other phases for comparing 
transcriptomes derived from different carbon substrate-grown 
cells in this study. 

Nevertheless, it is noteworthy that different growth phases 
may cause some shifts to some metabolic pathways; thus future 
studies comparing cells grown on cellobiose vs. cells grown 
on woody biomass at the exponential phase would provide 
another angle to investigate the effects of biomass substrate 
on the transcriptomes of C. thermocellum cells, which may 
lead to the identification of both growth phase- and biomass 
substrate-responsive genes. In addition, the transcriptional pro- 
files of these two cell samples could also be affected by the 
distinct concentrations of carbohydrate sources as well as the 
residual concentrations of metabolites/nutrients present at time 
of sampling, which will remain a challenge for future stud- 
ies in managing these variables — some of which are nearly 
uncontrollable. 

CONCLUSIONS 

We conducted a RNA-Seq analysis of the transcriptional pro- 
files of C. thermocellum grown on PYP and cellobiose, which is 
different from previous transcriptional studies that focused on 
degrading cellulose alone substrates, or sub-proteomic studies 
that focused on cellulosomal protein components. We found that 
nearly 60% of the genes encoding the protein components of the 
cellulosomes — the core machinery for cellulose degradation — 
were up-regulated, whereas only 5% were down-regulated. The 
top up-regulated cellulosomal genes, along with the responsive 
SigTRsgl and cellodextrin transporter genes, present promis- 
ing candidate genes for engineering C. thermocellum strains 
to improve their capacity in effectively converting lignocellu- 
losic biomass substrates. Furthermore, the identified differentially 
expressed NAD(P)H H 2 ase, ADH, and PTA genes may provide 
insight into how the cells regulate the production of H 2 and 
ethanol under the carbon-limited condition. 
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